On block-asynchronous execution on GPUs

Authors

Hartwig Anzt

Jack Dongarra

Edmond Chow

Abstract

This paper experimentally investigates how GPUs execute instructions when used for general-purpose computing (GPGPU). We use a light-weight kernel realizing a vector operation to analyze which vector entries are updated subsequently, and to identify regions where parallel execution can be expected. The results help us understand how GPUs operate and map this mode of operation to the mathematical concept of asynchronism. In particular, they help to explain the effects that can occur when a fixed-point method is implemented with in-place updates on GPU hardware.

Keywords: GPU computing, asynchronous execution, block-asynchronous iteration
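To make the contrast between synchronous and in-place updates concrete, the following is a minimal CPU-side sketch in Python. It is not the kernel used in the paper. It assumes a small, strictly diagonally dominant tridiagonal test system and models the unknown GPU scheduling order with a seeded random shuffle of blocks. All function names, the matrix, and the block size are hypothetical choices made for illustration.

```python
import random


def build_system(n):
    """Hypothetical test problem: strictly diagonally dominant tridiagonal A, b = 1."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 4.0
        if i > 0:
            A[i][i - 1] = -1.0
        if i < n - 1:
            A[i][i + 1] = -1.0
    return A, [1.0] * n


def jacobi_sweep(A, b, x):
    """Synchronous sweep: every entry reads the old vector and writes to a new one."""
    n = len(x)
    return [
        (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
        for i in range(n)
    ]


def block_async_sweep(A, b, x, block_size, rng):
    """In-place sweep modelling block-asynchronous behaviour.

    Blocks are processed in a shuffled order (an assumed stand-in for the
    hardware scheduler). Entries inside one block are computed from the same
    snapshot, as if they ran in parallel, while later blocks already see the
    values written by earlier blocks.
    """
    n = len(x)
    starts = list(range(0, n, block_size))
    rng.shuffle(starts)
    for s in starts:
        snapshot = list(x)  # state of the vector when this block starts
        for i in range(s, min(s + block_size, n)):
            off_diag = sum(A[i][j] * snapshot[j] for j in range(n) if j != i)
            x[i] = (b[i] - off_diag) / A[i][i]  # in-place write
    return x


def residual_norm(A, b, x):
    """Maximum-norm residual of A x = b."""
    n = len(x)
    return max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n))) for i in range(n))


def run(n=16, block_size=4, sweeps=30, seed=0):
    """Run both variants for the same number of sweeps and return their residuals."""
    A, b = build_system(n)
    rng = random.Random(seed)
    x_sync = [0.0] * n
    x_async = [0.0] * n
    for _ in range(sweeps):
        x_sync = jacobi_sweep(A, b, x_sync)
        x_async = block_async_sweep(A, b, x_async, block_size, rng)
    return residual_norm(A, b, x_sync), residual_norm(A, b, x_async)
```

Calling `run()` returns the residuals of the synchronous and the block-wise in-place variant after the same number of sweeps. In this model the in-place iterate depends on the order in which blocks are processed, which is the kind of effect the abstract refers to. On real hardware that order is not under the programmer's control.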

Similar resources

Accelerating high-order WENO schemes using two heterogeneous GPUs

A double-GPU code is developed to accelerate WENO schemes. The test problem is a compressible viscous flow. The convective terms are discretized using third- to ninth-order WENO schemes and the viscous terms are discretized by the standard fourth-order central scheme. The code written in CUDA programming language is developed by modifying a single-GPU code. The OpenMP library is used for parall...

Some Issues in Dense Linear Algebra for Multicore and Special Purpose Architectures

We address some key issues in designing dense linear algebra (DLA) algorithms that are common for both multi/many-cores and special purpose architectures (in particular GPUs). We present them in the context of an LU factorization algorithm, where randomization techniques are used as an alternative to pivoting. This approach yields an algorithm based entirely on a collection of small Level 3 BLA...

Weighted Block-Asynchronous Iteration on GPU-Accelerated Systems

In this paper, we analyze the potential of using weights for block-asynchronous relaxation methods on GPUs. For this purpose, we introduce different weighting techniques similar to those applied in block-smoothers for multigrid methods. For test matrices taken from the University of Florida Matrix Collection we report the convergence behavior and the total runtime for the different techniques. A...

Asynchronous Iterative Algorithm for Computing Incomplete Factorizations on GPUs

This paper presents a GPU implementation of an asynchronous iterative algorithm for computing incomplete factorizations. Asynchronous algorithms, with their ability to tolerate memory latency, form an important class of algorithms for modern computer architectures. Our GPU implementation considers several non-traditional techniques that can be important for asynchronous algorithms to optimize c...

Weighted Block-Asynchronous Relaxation for GPU-Accelerated Systems

In this paper, we analyze the potential of using weights for block-asynchronous relaxation methods on GPUs. For this purpose, we introduce different weighting techniques similar to those applied in block-smoothers for multigrid methods. Having proven a sufficient convergence condition for the weighted block-asynchronous iteration, we analyze the performance of the algorithms implemented using C...


Publication date: 2016
